{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YWjR-pgsd-Si"
      },
      "source": [
        "# CS640 Homework 2: Neural Network\n",
        "\n",
        "In this assignment, you will\n",
        "\n",
        "1. derive both forward and backward propagation,\n",
        "2. implement a neural network from scratch, and\n",
        "3. run experiments with your model.\n",
        "\n",
        "### Collaboration\n",
        "You are allowed to work in a team of at most **three** on the coding part(**Q2**), but you must run the experiments and answer written questions independently.\n",
        "\n",
        "## Instructions\n",
        "\n",
        "### General Instructions\n",
        "In an ipython notebook, to run code in a cell or to render [Markdown](https://en.wikipedia.org/wiki/Markdown)+[LaTeX](https://en.wikipedia.org/wiki/LaTeX) press `Ctrl+Enter` or `[>|]`(like \"play\") button above. To edit any code or text cell (double) click on its content. To change cell type, choose \"Markdown\" or \"Code\" in the drop-down menu above.\n",
        "\n",
        "Most of the written questions are followed up a cell for you enter your answers. Please enter your answers in a new line below the **Answer** mark. If you do not see such cell, please insert one by yourself. Your answers and the questions should **not** be in the same cell.\n",
        "\n",
        "### Instructions on Math\n",
        "Some questions require you to enter math expressions. To enter your solutions, put down your derivations into the corresponding cells below using LaTeX. Show all steps when proving statements. If you are not familiar with LaTeX, you should look at some tutorials and at the examples listed below between \\$..\\$. The [OEIS website](https://oeis.org/wiki/List_of_LaTeX_mathematical_symbols) can also be helpful.\n",
        "\n",
        "Alternatively, you can scan your work from paper and insert the image(s) in a text cell.\n",
        "\n",
        "## Submission\n",
        "Once you are ready, save the note book as PDF file (File -> Print -> Save as PDF) and submit via Gradescope."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "O9ImiOu7jzNu"
      },
      "source": [
        "## Q0: Name(s)\n",
        "\n",
        "Please write your name in the next cell. If you are collaborating with someone, please list their names as well."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "R4ldlvqYkyKw"
      },
      "source": [
        "**Answer**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2NkWp9N_jDa0"
      },
      "source": [
        "## Q1: Written Problems\n",
        "\n",
        "Consider a simple neural network with three layers: an input layer, a hidden layer, and an output layer.\n",
        "\n",
        "Let $w^{(1)}$ and $w^{(2)}$ be the layers' weight matrices and let $b^{(1)}$ and $b^{(2)}$ be their biases. For convention, suppose that $w_{ij}$ is the weight between the $i$th node in the previous layer and the $j$th node in the current one.\n",
        "\n",
        "Additionally, the activation function for both layers is the sigmoid function $\\sigma(x) = \\frac{1}{1 + e^{-x}}$. Let $z^{(1)}$ and $z^{(2)}$ be the outputs of the two layers before activation, and let $a^{(1)} = \\sigma(z^{(1)})$ and $a^{(2)} = \\sigma(z^{(2)})$.\n",
        "\n",
        "Lastly, we choose the L2 loss $L(y_{\\text{true}}, y_{\\text{predict}}) = \\frac{1}{2}(y_{\\text{true}} - y_{\\text{predict}})^{2}$ as the loss function.\n"
      ]
    },
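    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "One way to write the forward pass under the weight convention above is sketched below. It assumes that activations and biases are treated as column vectors, which the text does not fix explicitly, so adapt the orientation to your own derivation. The layer index $l$ is introduced here only for the element-level form.\n",
        "\n",
        "```latex\n",
        "z^{(l)}_{j} = \\sum_{i} w^{(l)}_{ij} a^{(l-1)}_{i} + b^{(l)}_{j}\n",
        "z^{(1)} = (w^{(1)})^{\\top} a^{(0)} + b^{(1)}, \\qquad a^{(1)} = \\sigma(z^{(1)})\n",
        "z^{(2)} = (w^{(2)})^{\\top} a^{(1)} + b^{(2)}, \\qquad a^{(2)} = \\sigma(z^{(2)})\n",
        "L = \\frac{1}{2}(y_{\\text{true}} - a^{(2)})^{2}\n",
        "```\n",
        "\n",
        "The transpose appears because $w_{ij}$ indexes the previous layer by row and the current layer by column."
      ]
    },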
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "swVLLpgYmtxv"
      },
      "source": [
        "### Q1.1: Forward Pass\n",
        "Suppose that\n",
        "\n",
        "$w^{(1)} = \\begin{bmatrix}0.4 & 0.6 & 0.2 \\\\ 0.3 & 0.9 & 0.5\\end{bmatrix}$,\n",
        " $b^{(1)} = [1, 1, 1]$; and\n",
        "\n",
        "$w^{(2)} = \\begin{bmatrix}0.2 \\\\ 0.2 \\\\ 0.8\\end{bmatrix}$, $b^{(2)} = [0.5]$.\n",
        "\n",
        "If the input is $a^{(0)} = \\begin{bmatrix}1 \\\\ 1\\end{bmatrix}$, what is the network output? Show your calculation steps and round your answer to 4 decimals."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MmZ7isA1s9rT"
      },
      "source": [
        "**[Answer]**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Rw_yNYW4t2Z0"
      },
      "source": [
        "### Q1.2: Backward Propagation\n",
        "\n",
        "Use the chain rule to derive the expressions of the following gradients:\n",
        "1. $\\frac{\\partial L}{\\partial w^{(2)}}$ and $\\frac{\\partial L}{\\partial b^{(2)}}$\n",
        "2. $\\frac{\\partial L}{\\partial w^{(1)}}$ and $\\frac{\\partial L}{\\partial b^{(1)}}$\n",
        "\n",
        "Your final answers should only include the variables appeared in the question.\n",
        "\n",
        "*Hint #1*: Begin by writing down the chain of partial derivatives, and then plug in predefined variables.\n",
        "\n",
        "*Hint #2*: While plugging in predefined variables, be careful about the dimensions and orientation. You can first write down the expressions in the element level and then figure out the matrix form.\n",
        "\n",
        "*Hint #3*: The derivative of $\\sigma(x)$ is $\\sigma(x)(1 - \\sigma(x))$.\n",
        "\n",
        "*Hint #4*: The LaTex code for dot product and element-wise product: $\\cdot$ and $\\odot$."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "y9AkrYYVLnRl"
      },
      "source": [
        "**[Answer]**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "H6AxRuFOlU_B"
      },
      "source": [
        "## Q2: Implementation\n",
        "\n",
        "In this part, you need to construct a neural network model (almost) from scratch, run experiments, and write reports. We provide a script of skeleton code as well as three datasets.\n",
        "\n",
        "Your tasks are the following.\n",
        "1. Build your network model following the instruction.\n",
        "2. Run experiments and produce results.\n",
        "3. Interpret and discuss your results."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ce5gxnBCm7HW"
      },
      "source": [
        "### Q2.1: Import Packages\n",
        "\n",
        "The packages that have been imported in the following block should be sufficient for this assignment, but you are free to add more if necessary. However, keep in mind that you **should not** import and use any neural network package. If you have concern about an addition package, please contact us via Piazza."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "rLhHI0rOdQJ0"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "from sklearn.model_selection import StratifiedKFold\n",
        "from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score\n",
        "import matplotlib.pyplot as plt\n",
        "from matplotlib.ticker import MaxNLocator\n",
        "import pandas"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "N9TiTFf9psFl"
      },
      "source": [
        "### Q2.2: Define Activation and Loss Functions\n",
        "\n",
        "Complete the following functions. The ones starting with a \"d\" are the derivatives of the corresponding functions.\n",
        "\n",
        "Definitions:\n",
        "1. sigmoid: $\\sigma(x) = \\frac{1}{1 + e^{-x}}$\n",
        "2. softmax: softmax(x) $= \\frac{e^{x_{i}}}{\\sum_{i} e^{x_{i}}}$\n",
        "3. L2 loss: $L(y_{\\text{true}}, y_{\\text{predict}}) = \\frac{1}{2}(y_{\\text{true}} - y_{\\text{predict}})^{2}$\n",
        "4. cross entropy loss: $L(y_{\\text{true}}, y_{\\text{predict}}) = -\\sum_{i}y_{\\text{true}}[i]\\cdot\\log y_{\\text{predict}}[i]$"
      ]
    },
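    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A standard identity that may help when implementing softmax: subtracting the same constant from every input leaves the output unchanged. The constant $c$ below is introduced only for illustration and is not part of the definitions above. Choosing $c$ as the largest input is a common way to keep the exponentials from overflowing.\n",
        "\n",
        "```latex\n",
        "\\frac{e^{x_{i} - c}}{\\sum_{j} e^{x_{j} - c}} = \\frac{e^{-c} e^{x_{i}}}{e^{-c} \\sum_{j} e^{x_{j}}} = \\frac{e^{x_{i}}}{\\sum_{j} e^{x_{j}}}\n",
        "```"
      ]
    },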
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "yuxWGvbhp5jD"
      },
      "outputs": [],
      "source": [
        "def sigmoid(x):\n",
        "    pass\n",
        "\n",
        "def d_sigmoid(x):\n",
        "    pass\n",
        "\n",
        "def softmax(x):\n",
        "    pass\n",
        "\n",
        "def l2_loss(YTrue, YPredict):\n",
        "    pass\n",
        "\n",
        "def d_l2_loss(YTrue, YPredict):\n",
        "    pass\n",
        "\n",
        "def cross_entropy_loss(YTrue, YPredict):\n",
        "    pass\n",
        "\n",
        "def d_cross_entropy_softmax(YTrue, YPredict):\n",
        "    pass"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zvhPU4YqoVZF"
      },
      "source": [
        "### Q2.3: Define the Layer Class\n",
        "\n",
        "Complete the `initialize_weights` function, which initializes the weights and biases with small random values. The `__init__` function should be left as it is.\n",
        "\n",
        "*Hint*: It is recommended that you define weights and bias separately for clarity."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "6r5h0hT8ofqQ"
      },
      "outputs": [],
      "source": [
        "class Layer:\n",
        "    def __init__(self, n_input, n_output, bias = True):\n",
        "        self.n_input = n_input\n",
        "        self.n_output = n_output\n",
        "        self.bias = bias\n",
        "        self.initialize_weights()\n",
        "\n",
        "    def initialize_weights(self):\n",
        "        \"\"\"\n",
        "        Initializes the weights and biases with small random values.\n",
        "        \"\"\"\n",
        "        rng = np.random.default_rng(2) # for re-producibility, do not change this\n",
        "        ########################## start of your code ##########################\n",
        "\n",
        "        ########################## end of your code ############################\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fXcQsl_XpAOW"
      },
      "source": [
        "### Q2.4: Define the Network Class\n",
        "\n",
        "Complete the `fit` and `predict` functions as instructed in the comments. Do not change their input arguments, but you are free to add functions as necessary. The `__init__` function should be left as it is.\n",
        "\n",
        "*Hint \\#1*: This is the heaviest part of this assignment. We recommend you to first go over the math carefully before starting this part.\n",
        "\n",
        "*Hint \\#2*: You are strongly encouraged to use numpy for matrix operations. When doing multiplication, please be careful about the dimensions, as well as the difference between the \"\\*\" operator, numpy's `multiply` function, and numpy's `dot` function."
      ]
    },
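    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the distinction in Hint \\#2 concrete, here is a minimal sketch with two hypothetical column vectors `u` and `v` (they are not part of the skeleton code). It only illustrates how the three operations treat shapes and is not a piece of the required implementation.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "# Hypothetical 3-element column vectors, chosen only for illustration.\n",
        "u = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1)\n",
        "v = np.array([[4.0], [5.0], [6.0]])  # shape (3, 1)\n",
        "\n",
        "# The * operator and np.multiply are both element-wise and agree.\n",
        "elementwise_star = u * v\n",
        "elementwise_func = np.multiply(u, v)\n",
        "\n",
        "# np.dot is a matrix product: (1, 3) by (3, 1) gives a (1, 1) inner product,\n",
        "# while (3, 1) by (1, 3) gives a (3, 3) outer product.\n",
        "inner = np.dot(u.T, v)\n",
        "outer = np.dot(u, v.T)\n",
        "\n",
        "# Pitfall: broadcasting silently turns (3, 1) * (1, 3) into a (3, 3) array.\n",
        "broadcast = u * v.T\n",
        "\n",
        "print(elementwise_star.shape, inner.shape, outer.shape, broadcast.shape)\n",
        "```\n",
        "\n",
        "Checking `.shape` after each operation is a cheap way to catch an unintended broadcast before it propagates into the gradients."
      ]
    },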
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "-p1xdHw0pKr8"
      },
      "outputs": [],
      "source": [
        "class Network:\n",
        "    def __init__(self, layers, activation_list, d_activation_list, loss_function, d_loss_function):\n",
        "        self.layers = layers\n",
        "        self.activation_list = activation_list\n",
        "        self.d_activation_list = d_activation_list\n",
        "        self.loss_function = loss_function\n",
        "        self.d_loss_function = d_loss_function\n",
        "\n",
        "    def fit(self, X, Y, learning_rate, reg_lambda):\n",
        "        \"\"\"\n",
        "        This is the training function. It should return the average loss over samples.\n",
        "        \"\"\"\n",
        "        loss, n_sample = 0, len(X)\n",
        "\n",
        "        ########################## start of your code ##########################\n",
        "        # first, initialize zero gradients\n",
        "\n",
        "\n",
        "        # next, for each sample,\n",
        "        # 1. compute outputs from each layer (via some forward function);\n",
        "        # 2. compute and accumulate the loss (via the self.loss_function); and\n",
        "        # 3. compute and accumulate the gradients (via some backprog function)\n",
        "\n",
        "\n",
        "        # then, update weights and biases using the corresponding gradients\n",
        "        # don't forget to take the mean before updating\n",
        "\n",
        "        ########################## end of your code ############################\n",
        "\n",
        "        # lastly, return the average loss\n",
        "        return loss / n_sample\n",
        "\n",
        "    def predict(self, X, threshold = None):\n",
        "        \"\"\"\n",
        "        This function predicts the labels for samples in X. The parameter threshold\n",
        "        is used when the labels are binary and there is only one node in the final\n",
        "        layer of the network.\n",
        "        \"\"\"\n",
        "        YPredict = []\n",
        "\n",
        "        ######################### start of your code ###########################\n",
        "        # for each sample, run a forward pass and append the predicted label to YPredict\n",
        "\n",
        "        ######################### end of your code #############################\n",
        "\n",
        "        # return as a numpy array\n",
        "        return np.array(YPredict)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3NS8f9qz0fSK"
      },
      "source": [
        "### Q2.5: Test Model\n",
        "\n",
        "Use the following example code to test your model with some simple data.\n",
        "\n",
        "**Make sure to produce a decreasing loss curve here before moving on.**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "gTyLlzEa0a0U"
      },
      "outputs": [],
      "source": [
        "from sklearn import datasets\n",
        "\n",
        "X, Y = datasets.load_iris(return_X_y = True)\n",
        "X, Y = X[:100, :2], Y[:100]\n",
        "rng = np.random.default_rng(2)\n",
        "indices = [i for i in range(100)]\n",
        "rng.shuffle(indices)\n",
        "X, Y = X[indices], Y[indices]\n",
        "\n",
        "\n",
        "# assemble your model\n",
        "layers = [Layer(2, 4), Layer(4, 1)]\n",
        "model = Network(layers, [sigmoid, sigmoid], [d_sigmoid, d_sigmoid], l2_loss, d_l2_loss)\n",
        "\n",
        "# specify training parameters\n",
        "epochs = 100\n",
        "learning_rate = 1e-2\n",
        "reg_lambda = 0\n",
        "\n",
        "# capture the loss values during training\n",
        "loss = np.zeros(epochs)\n",
        "\n",
        "# start training\n",
        "for epoch in range(epochs):\n",
        "    loss[epoch] = model.fit(X, Y, learning_rate, reg_lambda)\n",
        "\n",
        "# plot the losses, the curve should be decreasing\n",
        "plt.plot([i for i in range(epochs)], loss)\n",
        "plt.title(\"Training Loss\")\n",
        "plt.xlabel(\"Epoch\")\n",
        "plt.show()\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vI0x356RyO0h"
      },
      "source": [
        "## Q3: Real Data Experiments with Dataset 1\n",
        "\n",
        "In this section, you will implement experiments with dataset1. There are two subsets in this dataset: linearly and nonlinearly.\n",
        "\n",
        "For each subset, your tasks are the following:\n",
        "1. Split it using [StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html) with K = 5. Make sure the splitting is **random** (preferrably seeded).\n",
        "2. For each split, perform training and test with an instance of your model.\n",
        "3. Compute the **confusion matrix**. The values should be ***accumulated*** across all folds.\n",
        "4. Compute the **performance results**: accuracy, precision, recall, and F1. The values should the ***average*** across all folds.\n",
        "\n",
        "Please show the results clearly (one item at a time)."
      ]
    },
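    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The steps above can be sketched as follows. This is only an outline under stated assumptions: the data are a synthetic stand-in for the real subset, and a fixed threshold rule takes the place of a trained model, so substitute your own loading, training, and prediction code.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "from sklearn.model_selection import StratifiedKFold\n",
        "from sklearn.metrics import confusion_matrix, accuracy_score\n",
        "\n",
        "# Synthetic stand-in data (hypothetical): 20 samples per class.\n",
        "X = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)\n",
        "Y = (X[:, 0] > 0).astype(int)\n",
        "\n",
        "# shuffle=True makes the split random; random_state seeds it.\n",
        "skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)\n",
        "\n",
        "total_cm = np.zeros((2, 2), dtype=int)  # accumulated across folds\n",
        "accuracies = []                         # averaged across folds\n",
        "\n",
        "for train_idx, test_idx in skf.split(X, Y):\n",
        "    # Placeholder rule standing in for a model trained on X[train_idx].\n",
        "    Y_pred = (X[test_idx, 0] > 0).astype(int)\n",
        "    total_cm += confusion_matrix(Y[test_idx], Y_pred, labels=[0, 1])\n",
        "    accuracies.append(accuracy_score(Y[test_idx], Y_pred))\n",
        "\n",
        "print(total_cm)\n",
        "print(np.mean(accuracies))\n",
        "```\n",
        "\n",
        "Passing `labels=[0, 1]` keeps every per-fold confusion matrix the same shape, so the matrices can be summed even if a fold happens to miss a predicted class."
      ]
    },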
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NfMeuvrO3RaU"
      },
      "source": [
        "### Q3.1: LinearXY\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "gDONUUH37ooO"
      },
      "outputs": [],
      "source": [
        "# write your code in this block"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SxFyc5kI8D-I"
      },
      "source": [
        "### Q3.2: NonLinearXY\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "uWAnLbvb9Kdz"
      },
      "outputs": [],
      "source": [
        "# write your code in this block"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jGasEudB__Dh"
      },
      "source": [
        "## Q4: Real Data Experiments with Dataset 2\n",
        "\n",
        "Dataset2 has been split into training and test subsets, so you only need to load them accordingly.\n",
        "\n",
        "In this part, you need to try out different model parameter values and observe how they affect the results.\n",
        "\n",
        "For each of the questions below, show **performance results** as four lists. An eample output is the following:\n",
        "\n",
        "`accuracy scores: [1., 1., 1., 1.]`\n",
        "\n",
        "`precision scores: [1., 1., 1., 1.]`\n",
        "\n",
        "`recall scores: [1., 1., 1., 1.]`\n",
        "\n",
        "`f1 scores: [1., 1., 1., 1.]`\n",
        "\n",
        "Use the following function to obtain one-hot encoded labels. Note that the returned labels are by default **row vectors**."
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.preprocessing import OneHotEncoder\n",
        "\n",
        "# Simply pass the labels as two 1D arrays.\n",
        "def one_hot_encode(YTrain, YTest):\n",
        "    encoder = OneHotEncoder(sparse_output = False)\n",
        "    return encoder.fit_transform(YTrain.reshape(-1, 1)), encoder.transform(YTest.reshape(-1, 1))"
      ],
      "metadata": {
        "id": "ejjmpRMQswly"
      },
      "execution_count": null,
      "outputs": []
    },
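    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A small sketch of how the four scores might be computed from per-class network outputs. It assumes that Dataset 2 has more than two classes (suggested by the one-hot encoding, but not stated) and uses made-up labels and scores. The choice of `average='macro'` is also an assumption, so check which averaging mode the assignment expects.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score\n",
        "\n",
        "# Hypothetical 3-class labels and softmax-like outputs, for illustration only.\n",
        "y_true = np.array([0, 1, 2, 2, 1, 0])\n",
        "probs = np.array([\n",
        "    [0.8, 0.1, 0.1],\n",
        "    [0.2, 0.7, 0.1],\n",
        "    [0.1, 0.2, 0.7],\n",
        "    [0.3, 0.4, 0.3],\n",
        "    [0.1, 0.8, 0.1],\n",
        "    [0.6, 0.3, 0.1],\n",
        "])\n",
        "\n",
        "# Convert each row of class scores to a predicted class index.\n",
        "y_pred = np.argmax(probs, axis=1)\n",
        "\n",
        "# With more than two classes, the default average='binary' raises an error,\n",
        "# so an averaging mode has to be chosen explicitly.\n",
        "acc = accuracy_score(y_true, y_pred)\n",
        "prec = precision_score(y_true, y_pred, average='macro', zero_division=0)\n",
        "rec = recall_score(y_true, y_pred, average='macro', zero_division=0)\n",
        "f1 = f1_score(y_true, y_pred, average='macro', zero_division=0)\n",
        "\n",
        "print(acc, prec, rec, f1)\n",
        "```\n",
        "\n",
        "If the true labels are one-hot encoded, applying `np.argmax(..., axis=1)` to them as well recovers class indices that the metric functions accept."
      ]
    },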
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UyItn83F-tBV"
      },
      "source": [
        "### Q4.1: Epochs\n",
        "\n",
        "Experiment with at least **5** different choices of total epochs."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "X6cKLcwf_K7D"
      },
      "outputs": [],
      "source": [
        "# write your code in this block"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JwDcgfbpDHF4"
      },
      "source": [
        "### Q4.2: Learning Rate\n",
        "\n",
        "Experiment with at least **5** different choices of learning rates."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "LEMZuxy9DRAg"
      },
      "outputs": [],
      "source": [
        "# write your code in this block"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "96i2ZAsBDTaZ"
      },
      "source": [
        "### Q4.3: Regularization Parameter\n",
        "\n",
        "Experiment with at least **3** different choices of regularization parameter."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ANhxVaLcDfjn"
      },
      "outputs": [],
      "source": [
        "# write your code in this block"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "N_LjwNPkDgyq"
      },
      "source": [
        "### Q4.4: Network Structure\n",
        "\n",
        "Experiment with at least **5** different choices of network structure. This includes number of layers and number of nodes in each layer.\n",
        "\n",
        "*Hint*: Try experimenting with increasing complexity."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "zae3B1jqFhHd"
      },
      "outputs": [],
      "source": [
        "# write your code in this block"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pmRKZaJPFpm1"
      },
      "source": [
        "## Q5: Follow-up Questions\n",
        "\n",
        "For each question below, provide a short answer. You can cite your code if needed."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MPFklnbuAwg_"
      },
      "source": [
        "### Q5.1: Briefly describe the workflow of how your model classifies the data."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "q8RYC8yiBEvU"
      },
      "source": [
        "**[Answer]**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZJ6k2vbJBFcU"
      },
      "source": [
        "### Q5.2: In your own words, explain how the forward propagation in your model works."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eZSqrQX-BofT"
      },
      "source": [
        "**[Answer]**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xDkm7LCOBpOZ"
      },
      "source": [
        "### Q5.3: In your own words, explain how the backward propagation in your model works."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WDgKWePgHHpB"
      },
      "source": [
        "**[Answer]**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "u4v_t-JdB726"
      },
      "source": [
        "### Q5.4: In theory, how do the total number of epochs, the learning rate, and the regularization parameter impact the performance of model? Does any of the theoretical impact actually happen in your result? If so, point them out."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1zFnMPEJG53C"
      },
      "source": [
        "**[Answer]**"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}